AITopics

Country: Europe (0.28)

Industry: Information Technology (0.47)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Neural Information Processing Systems, Feb-11-2026, 13:16:32 GMT

14da15db887a4b50efe5c1bc66537089-AuthorFeedback.pdf

We are grateful for all the reviewers' valuable suggestions and questions. [...] The results are displayed in Figure 1. [...] stands for equality up to zero-valued paddings. [...] ICLR2019), but with the top layer to be zero. We will clarify this in the revised version.

artificial intelligence, initialization, machine learning, (17 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.49)

arXiv.org Artificial Intelligence, Mar-17-2025

HiDe-LLaVA: Hierarchical Decoupling for Continual Instruction Tuning of Multimodal Large Language Model

Guo, Haiyang, Zeng, Fanhu, Xiang, Ziwei, Zhu, Fei, Wang, Da-Han, Zhang, Xu-Yao, Liu, Cheng-Lin

Instruction tuning is widely used to improve a pre-trained Multimodal Large Language Model (MLLM) by training it on curated task-specific datasets, enabling better comprehension of human instructions. However, it is infeasible to collect all possible instruction datasets simultaneously in real-world scenarios. Thus, equipping MLLMs with continual instruction tuning is essential for maintaining their adaptability. Existing methods, however, often trade off memory efficiency for performance gains, significantly compromising overall efficiency. In this paper, we propose a task-specific expansion and task-general fusion framework based on the variations in Centered Kernel Alignment (CKA) similarity across different model layers when trained on diverse datasets. Furthermore, we analyze the information leakage present in the existing benchmark and propose a new and more challenging benchmark to rationally evaluate the performance of different methods. Comprehensive experiments showcase a significant performance improvement of our method compared to existing state-of-the-art methods. Our code will be publicly available.
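As a rough illustration of the CKA similarity mentioned in this abstract, the sketch below computes the standard linear form of CKA between two activation matrices (rows are examples, columns are features). It is a minimal sketch, not the authors' code: the abstract does not say which CKA variant or which layer activations the paper uses, and the function names are hypothetical.

```python
import math


def center_columns(X):
    """Subtract the per-feature mean so every column has zero mean."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in X]


def cross_frobenius_sq(X, Y):
    """Squared Frobenius norm of X^T Y for row-aligned matrices X and Y."""
    total = 0.0
    for i in range(len(X[0])):
        for j in range(len(Y[0])):
            s = sum(X[k][i] * Y[k][j] for k in range(len(X)))
            total += s * s
    return total


def linear_cka(X, Y):
    """Linear CKA: ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F) on centered data.

    Assumes X and Y hold activations for the same examples in the same order
    and that neither matrix is constant (otherwise the denominator is zero).
    """
    Xc, Yc = center_columns(X), center_columns(Y)
    numerator = cross_frobenius_sq(Xc, Yc)
    denominator = math.sqrt(cross_frobenius_sq(Xc, Xc) * cross_frobenius_sq(Yc, Yc))
    return numerator / denominator
```

In such a setup, a value near 1 for a given layer would suggest that training on two datasets left that layer's representation largely unchanged, while lower values would suggest more task-specific drift.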

artificial intelligence, large language model, natural language, (18 more...)

2503.12941

Country: Asia > China > Fujian Province > Xiamen (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

arXiv.org Artificial Intelligence, Feb-2-2025

Structured Pneumatic Fingerpads for Actively Tunable Grip Friction

Allison, Katherine, Kelly, Jonathan, Hatton, Benjamin

Grip surfaces with tunable friction can actively modify contact conditions, enabling transitions between higher- and lower-friction states for grasp adjustment. Friction can be increased to grip securely and then decreased to gently release (e.g., for handovers) or manipulate in-hand. Recent friction-tuning surface designs using soft pneumatic chambers show good control over grip friction; however, most require complex fabrication processes and/or custom gripper hardware. We present a practical structured fingerpad design for friction tuning that uses less than $1 USD of materials, takes only seconds to repair, and is easily adapted to existing grippers. Our design uses surface morphology changes to tune friction. The fingerpad is actuated by pressurizing its internal chambers, thereby deflecting its flexible grip surface out from or into these chambers. We characterize the friction-tuning capabilities of our design by measuring the shear force required to pull an object from a gripper equipped with two independently actuated fingerpads. Our results show that varying actuation pressure and timing changes the magnitude of friction forces on a gripped object by up to a factor of 2.8. We demonstrate additional features including macro-scale interlocking behaviour and pressure-based object detection.

artificial intelligence, fingerpad, friction, (15 more...)

2502.00926

Country:

North America > Canada > Ontario > Toronto (0.15)
Asia > China (0.04)

Genre: Research Report > New Finding (0.54)

Technology: Information Technology > Artificial Intelligence > Robots > Manipulation (0.68)

arXiv.org Artificial Intelligence, Feb-2-2025

Harnessing Discrete Differential Geometry: A Virtual Playground for the Bilayer Soft Robotics

Li, Jiahao, Tong, Dezhong, Hao, Zhuonan, Zhu, Yinbo, Wu, Hengan, Liu, Mingchao, Huang, Weicheng

Robotics is the science of designing and constructing machines capable of movement, perception, and cognition to assist humans in performing various tasks. Inspired by living organisms, using soft matter in robot design has gained significant attention in recent decades. The inherent compliance of soft bodies allows them to adapt to complex environments, enabling innovative applications in fields such as healthcare, agriculture, and the food industry [1-10]. Given the potential of soft robots, various functional materials, such as liquid crystal elastomers, pneumatic actuators, and light-driven systems, have been explored as actuators due to their ability to deform in response to diverse external stimuli. However, the intrinsic compliance and nonlinearity of soft materials pose significant challenges in achieving precise and effective deformation control, which limits their practical effectiveness in real-world applications. A widely adopted approach to addressing this challenge is using bilayer structures in soft robot design. Inspired by natural phenomena such as the opening of pea pods, a bilayer structure consists of two layers (a top and a bottom layer) adhered at their interface [11], as illustrated in Figure 1A. When one layer undergoes expansion, a mismatch strain arises at the interface.

artificial intelligence, bilayer structure, robot, (18 more...)

2502.00714

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
Europe > United Kingdom > England > Tyne and Wear > Newcastle (0.04)
Asia > China > Anhui Province > Hefei (0.04)

Genre:

Research Report (0.40)
Overview (0.34)

Industry:

Materials > Chemicals (0.54)
Health & Medicine (0.48)

Technology: Information Technology > Artificial Intelligence > Robots (1.00)

arXiv.org Artificial Intelligence, Dec-4-2024

Hybrid deep learning-based strategy for the hepatocellular carcinoma cancer grade classification of H&E stained liver histopathology images

Deshpande, Ajinkya, Gupta, Deep, Bhurane, Ankit, Meshram, Nisha, Singh, Sneha, Radeva, Petia

Hepatocellular carcinoma (HCC) is a common type of liver cancer whose early-stage diagnosis is challenging, mainly because hematoxylin and eosin-stained whole slide images are assessed manually, which is time-consuming and may lead to variability in decision-making. For accurate detection of HCC, we propose a hybrid deep learning-based architecture that uses transfer learning to extract the features from pre-trained convolutional neural network (CNN) models and a classifier made up of a sequence of fully connected layers. This study uses the publicly available The Cancer Genome Atlas Hepatocellular Carcinoma (TCGA-LIHC) database (n=491) for model development and the database of Kasturba Gandhi Medical College (KMC), India, for validation. The pre-processing step involves patch extraction, colour normalization, and augmentation, which results in 3920 patches for the TCGA dataset. The developed hybrid deep neural network, consisting of a CNN-based pre-trained feature extractor and a customized artificial neural network-based classifier, is trained using five-fold cross-validation. For this study, eight different state-of-the-art models are trained and tested as feature extractors for the proposed hybrid model. The proposed hybrid model with a ResNet50-based feature extractor achieved a sensitivity, specificity, F1-score, accuracy, and AUC of 100.00%, 100.00%, 100.00%, 100.00%, and 1.00, respectively, on the TCGA database. On the KMC database, EfficientNetb3 was the best-performing feature extractor, giving a sensitivity, specificity, F1-score, accuracy, and AUC of 96.97%, 98.85%, 96.71%, 96.71%, and 0.99, respectively. The proposed hybrid models improved accuracy by 2% and 4% over the pre-trained models on the TCGA-LIHC and KMC databases, respectively.
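For readers unfamiliar with the metrics reported in this abstract, the following minimal sketch shows how sensitivity, specificity, F1-score, and accuracy follow from confusion-matrix counts. It assumes a binary (one-vs-rest) setting; the abstract does not say how the paper aggregates metrics across cancer grades. The function name and the example counts are hypothetical.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary metrics from confusion-matrix counts.

    Assumes every denominator is nonzero, i.e. there is at least one
    positive and one negative sample and at least one predicted positive.
    """
    sensitivity = tp / (tp + fn)            # recall on the positive class
    specificity = tn / (tn + fp)            # recall on the negative class
    precision = tp / (tp + fp)
    f1 = 2 * tp / (2 * tp + fp + fn)        # harmonic mean of precision and sensitivity
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "accuracy": accuracy,
    }


# Illustrative counts only, not taken from the paper.
example = classification_metrics(tp=8, fp=1, tn=9, fn=2)
```

AUC is not included here because it depends on the ranking of predicted scores rather than on a single confusion matrix.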

artificial intelligence, deep learning, machine learning, (18 more...)

2412.03084

Country:

Asia > India (0.24)
North America > United States (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Africa (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Oncology > Carcinoma (0.90)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence, Oct-23-2024

Future Token Prediction -- Causal Language Modelling with Per-Token Semantic State Vector for Multi-Token Prediction

Walker, Nicholas

Causal decoder-only transformer models used for generative language modelling, such as Generative Pre-trained Transformers (GPT), are trained to predict the next token in a sequence based only on its previous tokens. Despite this simple training objective, they have proved to be powerful AI tools. However, only predicting the next token results in top layer embedding vectors that are highly token-focused. There may be benefits in generating embedding vectors at each token position that better capture the overall meaning of longer sequences of future text. Recent studies matching brain scans with deep language models suggest that humans also predict upcoming words when listening or reading but consider multiple future tokens rather than just one. This research investigates a new pretraining method called Future Token Prediction (FTP). In FTP, a large transformer encoder generates top layer embedding vectors for each token position, which, instead of being passed to a language head, are linearly and expansively projected to a pseudo-sequence, which is cross-attended to by a small transformer decoder to predict the next N tokens forward from that position in the sequence. The top layer embedding vectors from FTP models exhibit distinct properties compared to those from standard GPT models, varying smoothly along a text sequence as measured by cosine similarity between adjacent tokens. Text generated by FTP models shows improved topic coherence compared to standard GPT-like models trained with the same prediction perplexity for the next single token. The vectors are shown to better represent the topic of text based on the results of text classification examples. On a toy, but complex, coding problem, FTP networks produce significantly better results than GPT networks.
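The smoothness measure described in this abstract, cosine similarity between embedding vectors at adjacent token positions, can be sketched as follows. This is an illustrative sketch under assumptions: the embeddings are given as plain lists of floats, none of them is the zero vector, and the function names are hypothetical rather than taken from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def adjacent_cosine(embeddings):
    """Similarity between each token's embedding and the next token's embedding.

    For a sequence of T embeddings this returns T - 1 values; values that stay
    close to 1 along the sequence indicate smoothly varying vectors.
    """
    return [cosine_similarity(a, b) for a, b in zip(embeddings, embeddings[1:])]
```

Averaging the returned values over many sequences would give a single smoothness score that could be compared between two models.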

decoder, large language model, machine learning, (19 more...)

2410.1816

Country:

Europe > Austria > Vienna (0.14)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Health Care Technology (0.48)
Health & Medicine > Diagnostic Medicine > Imaging (0.48)
Leisure & Entertainment > Games > Computer Games (0.46)
Health & Medicine > Therapeutic Area > Neurology (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.88)

arXiv.org Artificial Intelligence, Sep-2-2024

Disease Classification and Impact of Pretrained Deep Convolution Neural Networks on Diverse Medical Imaging Datasets across Imaging Modalities

Borah, Jutika, Sarmah, Kumaresh, Singh, Hidam Kumarjit

Imaging techniques such as chest X-rays, whole slide images, and optical coherence tomography serve as initial screening and detection tools for a wide variety of pulmonary and ophthalmic conditions. This paper investigates the intricacies of using pretrained deep convolutional neural networks with transfer learning across diverse medical imaging datasets with varying modalities for binary and multiclass classification. We conducted a comprehensive performance analysis with ten network architectures and model families, each with pretraining and random initialization. Our findings show that the use of pretrained models as fixed feature extractors yields poor performance irrespective of the datasets. In contrast, histopathology microscopy whole slide images have better performance. It is also found that deeper and more complex architectures did not necessarily result in the best performance. This observation implies that improvements on ImageNet do not translate in parallel to medical imaging tasks. Within a medical domain, the performance of the network architectures varies within model families with shifts in datasets. This indicates that the performance of models within a specific modality may not be conclusive for another modality within the same domain. This study provides a deeper understanding of the applications of deep learning techniques in medical imaging and highlights the impact of pretrained networks across different medical imaging datasets under five different experimental settings.

dataset, highest performance, random initialization, (15 more...)

2408.17011

Country:

Asia > China > Guangdong Province > Guangzhou (0.04)
North America > United States > Florida > Miami-Dade County > Miami (0.04)
Asia > India (0.04)
Africa (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing Systems, Mar-13-2024, 15:11:56 GMT

Training and Analyzing Deep Recurrent Neural Networks

Time series often have a temporal hierarchy, with information that is spread out over multiple time scales. Common recurrent neural networks, however, do not explicitly accommodate such a hierarchy, and most research on them has focused on training algorithms rather than on their basic architecture. In this paper we study the effect of a hierarchy of recurrent neural networks on processing time series. Here, each layer is a recurrent network which receives the hidden state of the previous layer as input. This architecture allows us to perform hierarchical processing on difficult temporal tasks, and more naturally capture the structure of time series. We show that these networks reach state-of-the-art performance for recurrent networks in character-level language modeling when trained with simple stochastic gradient descent. We also offer an analysis of the different emergent time scales.
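The stacking described in this abstract, where each layer is a recurrent network that takes the hidden state of the layer below as its input, can be sketched as a forward pass. This is a minimal sketch under assumptions not stated in the abstract: tanh units, no bias terms, zero initial states, and no output layer. All names are hypothetical.

```python
import math


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def rnn_step(W_in, W_rec, h_prev, x):
    """One recurrent update: h = tanh(W_in x + W_rec h_prev)."""
    pre = [a + b for a, b in zip(matvec(W_in, x), matvec(W_rec, h_prev))]
    return [math.tanh(p) for p in pre]


def deep_rnn_forward(layers, inputs):
    """Run a stack of recurrent layers over a sequence.

    layers: list of (W_in, W_rec) pairs, bottom layer first.
    inputs: list of input vectors, one per time step.
    Returns history[l][t], the hidden state of layer l at time step t.
    """
    states = [[0.0] * len(W_rec) for _, W_rec in layers]
    history = [[] for _ in layers]
    for x in inputs:
        layer_input = x
        for l, (W_in, W_rec) in enumerate(layers):
            states[l] = rnn_step(W_in, W_rec, states[l], layer_input)
            history[l].append(states[l])
            layer_input = states[l]  # the next layer reads this layer's hidden state
    return history
```

Inspecting how quickly `history[l]` changes over time for each layer `l` is one simple way to look at the per-layer time scales that the abstract refers to.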

contribution, drnn-ao, recurrent neural network, (14 more...)

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > Oregon (0.04)
North America > Canada > Ontario > Toronto (0.04)
(3 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)